For the last five years, natural language processing (NLP) has played a vital role in analyzing documents and conversations. Its business applications power everything from reviewing patent applications and summarizing and linking scientific papers to accelerating clinical trials, optimizing global supply chains, improving customer support, and recommending sports news. As the technology becomes widespread, enterprise investments in NLP software are evolving into common, everyday use cases.
According to research from last year, 2020 disrupted business globally, but NLP was a bright spot for technology investment. That momentum has carried into this year, yet we are still only seeing the tip of the iceberg in terms of what NLP has to offer. It's therefore important to keep tabs on how the industry is evolving, and new research from Gradient Flow does exactly that.
The 2021 NLP Industry Survey Report compares organizations with years of experience deploying NLP applications in production against those just getting started, and contrasts responses from Technical Leaders with those of general practitioners across company sizes, document volumes, and geographic regions. Together, these contrasts paint a comprehensive picture of the industry at large, as well as of what's to come.
Following the money is a sure way to keep a pulse on growth, and if finances are any indication, NLP is trending upward, and fast. As in last year's survey, NLP budgets are increasing significantly, a trend that has continued despite pandemic-driven IT spending setbacks. In fact, 60% of Tech Leaders indicated that their NLP budgets grew by at least 10%, up from 53% in 2020. Even more significant, 33% reported a 30% increase, and 15% said their budget more than doubled. These numbers will likely only grow as the economy continues to stabilize.
As budgets grow, especially in mature organizations (those that have had NLP in production for at least two years), several use cases are driving the uptick in investment. More than half of Tech Leaders singled out named entity recognition (NER) as their primary use case, followed by document classification (46%). While this is not surprising, harder tasks such as question answering are also moving to the forefront, showing the potential for NLP to become more user-friendly over the next several years.
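For readers new to the space, here is what NER looks like in practice: a minimal sketch using spaCy's small English pipeline (one of the libraries discussed later in the report). The sample sentence is purely illustrative, and the snippet assumes spaCy and its en_core_web_sm model are installed.

```python
# Minimal NER sketch with spaCy (assumes: pip install spacy
# and python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Acme Corp hired 40 analysts in Toronto in March 2021.")

# Each detected entity carries its surface text and a predicted type label.
for ent in doc.ents:
    print(ent.text, ent.label_)
# e.g. -> Acme Corp ORG / 40 CARDINAL / Toronto GPE / March 2021 DATE
```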
Not surprisingly, the top three data sources for NLP projects are text fields in databases, files (PDFs, docx, etc.), and online content, and progress is being made in extracting information from all of them. In healthcare, for example, using NLP to extract and normalize a patient's history, diagnoses, labs, procedures, social determinants, and treatment plan has repeatedly proven useful for improving diagnosis and care. Combining information drawn from natural language documents and notes with other data sources, such as structured data or medical imaging, provides a more comprehensive picture of each patient.
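As a rough sketch of how text gets from one of these sources into an NLP pipeline, the snippet below pulls raw text out of a PDF with the pypdf library. The file name is hypothetical, and real clinical documents would of course require far more careful parsing and normalization.

```python
# Sketch: extract raw text from a PDF so it can be fed to an NLP pipeline.
# "discharge_note.pdf" is a hypothetical file name used for illustration.
from pypdf import PdfReader

reader = PdfReader("discharge_note.pdf")
text = "\n".join(page.extract_text() or "" for page in reader.pages)
print(text[:500])  # inspect the first few hundred characters
```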
The healthcare and pharmaceutical industries have been at the forefront of artificial intelligence (AI) and NLP, so their use cases differ slightly from overall industry practice. Alongside NER and document classification, entity linking / knowledge graphs (41%) and de-identification (39%) were among their top use cases, as one would expect in a highly regulated industry. Financial services is another area where NLP is gaining traction, given its ability to parse textual data while understanding the nuances of industry jargon, numbers, different currencies, and company names and products.
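To give a flavor of de-identification, here is a toy sketch that masks person names and dates using spaCy's general-purpose NER model. Production healthcare systems rely on models trained specifically on clinical text, so treat this purely as an illustration of the idea.

```python
# Toy de-identification sketch: mask PERSON and DATE entities found by a
# general-purpose NER model (not suitable for real clinical data).
import spacy

nlp = spacy.load("en_core_web_sm")

def deidentify(text: str) -> str:
    doc = nlp(text)
    # Replace entities right-to-left so character offsets stay valid.
    for ent in reversed(doc.ents):
        if ent.label_ in {"PERSON", "DATE"}:
            text = text[:ent.start_char] + f"[{ent.label_}]" + text[ent.end_char:]
    return text

print(deidentify("Jane Doe was discharged on March 3, 2021."))
# -> "[PERSON] was discharged on [DATE]."
```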
NLP technology doesn't come without challenges. Accuracy remains a top priority among all NLP practitioners, and with the need to constantly update and tune models, the barriers to entry are still high. Many companies rely on data scientists to build models and keep them from degrading over time. Others default to cloud services, but report that these can be expensive, require sharing data with the cloud provider, and often make it hard or impossible to tune models (hurting accuracy). While 83% of respondents indicated they use at least one of the major cloud providers for NLP, most do so in addition to using NLP libraries. Tech Leaders cited difficulty tuning models and cost as the primary challenges with cloud NLP services.
Fortunately, more tools are becoming widely available to level the playing field. Spark NLP, the most popular library among survey respondents and currently used by 53% of respondents in the healthcare industry, is democratizing NLP through free offerings, pre-trained models, and no data-sharing requirements. NLP libraries popular within the Python ecosystem, including Hugging Face, spaCy, Natural Language Toolkit (NLTK), Gensim, and Flair, are also used by a majority of practitioners.
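As an example of what those free, pre-trained offerings look like, here is a minimal sketch that loads one of Spark NLP's published pretrained pipelines and annotates a sentence locally. It assumes the spark-nlp and pyspark packages are installed; the pipeline name comes from the library's public model hub.

```python
# Minimal Spark NLP sketch: load a published pretrained pipeline and
# annotate a sentence locally (assumes: pip install spark-nlp pyspark).
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()  # starts a local Spark session configured for Spark NLP

# "explain_document_dl" is a general-purpose pipeline (tokens, POS, NER, ...).
pipeline = PretrainedPipeline("explain_document_dl", lang="en")
result = pipeline.annotate("Gradient Flow published the 2021 NLP Industry Survey.")

print(result["entities"])  # named entities found by the pipeline's NER stage
```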
Between the numerous NLP libraries and cloud services, growing investments, and innovative new use cases, there is certainly cause for excitement about what's next for NLP. By tracking and understanding the common practices and roadblocks that exist, we can apply these lessons across the AI industry and keep moving recent research breakthroughs into real-world systems that put them to good use.